Missing Data Imputation




Rethinking the Diffusion Models for Missing Data Imputation: A Gradient Flow Perspective

Neural Information Processing Systems

Diffusion models have demonstrated competitive performance in the missing data imputation (MDI) task. However, directly applying diffusion models to MDI produces suboptimal performance, owing to two primary defects. First, the sample diversity promoted by diffusion models hinders the accurate inference of missing values. Second, data masking reduces the observable indices available for model training, hindering imputation performance. To address the first defect, we incorporate a negative entropy regularization term into the cost functional to suppress diversity and improve accuracy. To address the second, we demonstrate that the imputation procedure of our method, NewImp, induced by the conditional-distribution-related cost functional, can equivalently be replaced by one induced by the joint distribution, thereby naturally eliminating the need for data masking.
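Read through the gradient-flow lens, the diversity-suppressing regularizer can be sketched as a negative-entropy term added to the cost functional; the notation below is ours, a schematic rather than the paper's exact functional:

```latex
\mathcal{F}(\mu)
  \;=\; \underbrace{\mathcal{D}(\mu)}_{\text{imputation cost}}
  \;+\; \lambda \underbrace{\int \mu(x)\,\log \mu(x)\,\mathrm{d}x}_{\text{negative entropy}},
  \qquad \lambda > 0 .
```

Since the differential entropy is H(μ) = -∫ μ log μ dx, adding λ∫ μ log μ dx penalizes high-entropy (diverse) imputation distributions, so the gradient flow of F concentrates samples around the most plausible values.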


Missing data imputation for noisy time-series data and applications in healthcare

Le, Lien P., Thi, Xuan-Hien Nguyen, Nguyen, Thu, Riegler, Michael A., Halvorsen, Pål, Nguyen, Binh T.

arXiv.org Artificial Intelligence

Healthcare time series data is vital for monitoring patient activity but often contains noise and missing values caused by, for example, sensor errors or data interruptions. Imputation, i.e., filling in the missing values, is a common way to deal with this issue. In this study, we compare imputation methods, including Multiple Imputation with Random Forest (MICE-RF) and advanced deep learning approaches (SAITS, BRITS, Transformer), for noisy time series data with missing values, in terms of MAE, F1-score, AUC, and MCC, across missing-data rates from 10% to 80%. Our results show that MICE-RF can impute missing data effectively compared to the deep learning methods, and the improvement in classification performance on the imputed data indicates that imputation can have a denoising effect. An imputation algorithm applied to a time series with missing data can therefore offer denoising at the same time.
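As a rough sketch of the iterative-imputation idea behind MICE (with a linear least-squares model standing in for MICE-RF's random forests, a simplification of ours):

```python
import numpy as np

def iterative_impute(X, n_rounds=10):
    """MICE-style iterative imputation: cycle through the columns, regressing
    each column that has missing entries on all the others. A linear
    least-squares model stands in for the random forests of MICE-RF."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])      # initial fill: column means
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = miss[:, j]
            if not rows.any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([others, np.ones(len(X))])   # add an intercept
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            X[rows, j] = A[rows] @ coef                  # refill from the fitted model
    return X
```

Each round re-estimates the per-column models on the current fills, so imputations and models improve together.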


Nonparametric End-to-End Probabilistic Forecasting of Distributed Generation Outputs Considering Missing Data Imputation

Chen, Minghui, Meng, Zichao, Liu, Yanping, Luo, Longbo, Guo, Ye, Wang, Kang

arXiv.org Artificial Intelligence

In this paper, we introduce a nonparametric end-to-end method for probabilistic forecasting of distributed renewable generation outputs that includes missing data imputation. First, we employ a nonparametric probabilistic forecast model based on a long short-term memory (LSTM) network to model the probability distributions of distributed renewable generation outputs. Second, we design an end-to-end training process that incorporates missing data imputation through iterative imputation and iterative loss-based training procedures. This two-step modeling approach combines the strengths of the nonparametric method with those of the end-to-end approach. Consequently, our approach demonstrates strong probabilistic forecasting performance for distributed renewable generation outputs while effectively handling missing values. Simulation results confirm the superior performance of our approach compared to existing alternatives.
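The abstract does not spell out the training loss, but nonparametric probabilistic forecasts are typically fit with the pinball (quantile) loss; the hypothetical sketch below shows it, and checks that minimizing it over a constant forecast recovers the empirical quantile:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for quantile level q in (0, 1):
    under-prediction is charged q per unit, over-prediction (1 - q)."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# Minimizing the loss over a constant forecast recovers the empirical q-quantile
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([pinball_loss(samples, g, 0.9) for g in grid])]
```

Training one such model per quantile level (or one model with several quantile heads) yields a nonparametric predictive distribution without assuming any parametric family.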


FCMI: Feature Correlation based Missing Data Imputation

Mishra, Prateek, Mani, Kumar Divya, Johri, Prashant, Arya, Dikhsa

arXiv.org Artificial Intelligence

Processed data are insightful; crude data are obtuse. Missing values are a serious threat to data reliability, leading to inaccurate analysis and wrong predictions. We propose an efficient correlation-based technique, FCMI (Feature Correlation based Missing Data Imputation), to impute missing values in a dataset. Our central idea is to exploit the correlations among the dataset's attributes: the proposed algorithm picks the most highly correlated attributes and uses them to build a regression model whose parameters are optimized so that the correlation structure of the dataset is maintained. Experiments on both classification and regression datasets show that the proposed imputation technique outperforms existing imputation algorithms.
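A minimal sketch of the correlation-guided idea (using a single most-correlated predictor and a plain linear fit; the actual FCMI uses several highly correlated attributes and optimizes explicitly for correlation preservation):

```python
import numpy as np

def fcmi_style_impute(X, target_col):
    """Correlation-guided imputation sketch: find the column most correlated
    with the target column and fill its missing entries from a linear fit."""
    X = X.astype(float).copy()
    miss = np.isnan(X[:, target_col])
    obs = ~miss
    # Correlation of the target with every other column, on jointly observed rows
    corrs = {}
    for j in range(X.shape[1]):
        if j == target_col:
            continue
        valid = obs & ~np.isnan(X[:, j])
        corrs[j] = abs(np.corrcoef(X[valid, j], X[valid, target_col])[0, 1])
    best = max(corrs, key=corrs.get)                 # most correlated predictor
    valid = obs & ~np.isnan(X[:, best])
    slope, intercept = np.polyfit(X[valid, best], X[valid, target_col], 1)
    fill = miss & ~np.isnan(X[:, best])
    X[fill, target_col] = slope * X[fill, best] + intercept
    return X
```

Selecting predictors by correlation keeps the fill values consistent with the dependence structure already present in the observed data.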


Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Cardoso Pereira, Ricardo (University of Coimbra) | Seoane Santos, Miriam (University of Coimbra) | Pereira Rodrigues, Pedro (University of Porto) | Henriques Abreu, Pedro (University of Coimbra)

Journal of Artificial Intelligence Research

Missing data is a problem often found in real-world datasets, and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, among them the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new values to replace them. This study surveys the use of Autoencoders for the imputation of tabular data, covering 26 works published between 2014 and 2020. The analysis focuses on patterns and recommendations for the architecture, hyperparameters, and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the commonly used statistical methods.
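To make the mechanism concrete, here is a deliberately tiny denoising-autoencoder sketch in plain NumPy (linear encoder/decoder, manual gradients, synthetic low-rank data; real imputation autoencoders are deeper and nonlinear): train to reconstruct clean rows from corrupted ones, then fill a missing entry with its reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank data: 5 features driven by 2 latent factors
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 5))

W_enc = rng.normal(size=(5, 2)) * 0.3        # encoder weights
W_dec = rng.normal(size=(2, 5)) * 0.3        # decoder weights
lr, losses = 0.01, []
for _ in range(1500):
    keep = rng.random(X.shape) > 0.2         # denoising corruption:
    Xc = X * keep / 0.8                      # zero out 20% of entries (inverted dropout)
    H = Xc @ W_enc
    R = H @ W_dec                            # reconstruct the *clean* data
    G = 2.0 * (R - X) / len(X)               # d(loss)/dR for the mean-squared loss
    g_dec = H.T @ G                          # gradients before any update
    g_enc = Xc.T @ (G @ W_dec.T)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
    losses.append(float(((R - X) ** 2).mean()))

# Imputation: zero out the "missing" entry, reconstruct, read off the fill
row_missing = X[0].copy()
row_missing[3] = 0.0
fill = float((row_missing @ W_enc @ W_dec)[3])
```

Because training already zeroes random entries, rows with genuinely missing values look like the corrupted inputs the network learned to repair.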


Missing data imputation in machine learning pipelines

#artificialintelligence

Machine learning is an important part of working in R, and packages like mlr3 simplify the whole process: there is no need to manually split data into training and test sets, or to manually fit linear models. Going further, packages like mlr3pipelines let you create complex pipelines and include feature engineering and transformation in your models. R is also used by statisticians, and from statistics we have advanced methods of imputing missing data, such as mice or Amelia. What happens when we want to connect machine learning with a statistical approach?
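The pattern the post describes for mlr3pipelines can be sketched in Python with scikit-learn (an analogy of ours, not the post's code): the imputation step becomes an ordinary pipeline stage, fit on the training data and reused automatically at prediction time.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan        # 20% of entries missing at random

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # statistical step
    ("model", LogisticRegression()),             # learning step
])
pipe.fit(X, y)                                   # imputer statistics fit on training data only
```

Wrapping imputation inside the pipeline avoids leakage: the imputer's statistics come from the training fold alone, exactly the guarantee mlr3pipelines gives in R.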


Missing Data Imputation using Optimal Transport

Muzellec, Boris, Josse, Julie, Boyer, Claire, Cuturi, Marco

arXiv.org Machine Learning

Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function for imputing missing values. We propose practical methods to minimize these losses via end-to-end learning, which can either exploit or dispense with parametric assumptions on the underlying distribution of values. We evaluate our methods on datasets from the UCI repository, in MCAR, MAR, and MNAR settings. These experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
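The batch-matching criterion can be sketched numerically: compute an entropy-regularized OT (Sinkhorn) cost between two batches and treat it as a loss on the imputed entries. The minimal NumPy version below only evaluates the loss; the plain Sinkhorn loop and variable names are our simplification of the paper's Sinkhorn-divergence setup.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=1.0, n_iter=200):
    """Entropy-regularized OT cost between two uniformly weighted batches."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    K = np.exp(-C / eps)                                     # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))                        # uniform batch weights
    b = np.full(len(Y), 1.0 / len(Y))
    u, v = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iter):                                  # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                          # transport plan
    return float((P * C).sum())

# Two batches from the same distribution are cheap to transport into one another;
# a batch with badly "imputed" (here: shifted) values is expensive.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(64, 2)), rng.normal(size=(64, 2))
```

In the paper's setting, the imputed entries are the free parameters, and this cost is differentiated end-to-end so gradient descent pulls them toward values that make random batches indistinguishable.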


Missing Data Imputation for Classification Problems

Choudhury, Arkopal, Kosorok, Michael R.

arXiv.org Machine Learning

Imputation of missing data is a common need in classification problems where the feature training matrix has missing entries. A widely used solution to this imputation problem is the lazy-learning k-nearest neighbor (kNN) approach. However, most previous work on missing data does not take into account the presence of the class label in the classification problem. In addition, existing kNN imputation methods use variants of the Minkowski distance as a measure of distance, which does not work well with heterogeneous data. In this paper, we propose a novel iterative kNN imputation technique based on a class-weighted grey distance between the missing datum and all the training data. Grey distance works well in heterogeneous data with missing instances. The distance is weighted by mutual information (MI), a measure of the relevance between the features and the class label, which ensures that the imputation of the training data is directed towards improving classification performance. The proposed class-weighted grey kNN imputation algorithm demonstrates improved performance compared to other kNN imputation algorithms, as well as to standard imputation algorithms such as MICE and missForest, in both imputation and classification problems, on simulated scenarios and UCI datasets with various rates of missingness.
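A stripped-down sketch of the idea (mutual information estimated with a simple histogram plug-in, and a weighted Euclidean distance standing in for the paper's grey distance, both simplifications of ours):

```python
import numpy as np

def discrete_mi(x, y, bins=5):
    """Plug-in mutual information estimate between a binned feature and labels."""
    xb = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    classes = {c: i for i, c in enumerate(np.unique(y))}
    joint = np.zeros((bins, len(classes)))
    for xi, yi in zip(xb, y):
        joint[xi, classes[yi]] += 1
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def mi_weighted_knn_impute(X, y, k=3):
    """kNN imputation with features weighted by their MI with the class label
    (weighted Euclidean distance in place of grey distance)."""
    X = X.astype(float).copy()
    d = X.shape[1]
    obs = ~np.isnan(X)
    w = np.array([discrete_mi(X[obs[:, j], j], y[obs[:, j]]) for j in range(d)])
    w = w / w.sum()                                  # class-relevance feature weights
    for i, j in zip(*np.where(np.isnan(X))):
        feats = obs[i] & (np.arange(d) != j)         # features usable for row i
        donors = np.where(obs[:, j] & obs[:, feats].all(axis=1))[0]
        dist = np.sqrt(((X[donors][:, feats] - X[i, feats]) ** 2 * w[feats]).sum(axis=1))
        nearest = donors[np.argsort(dist)[:k]]
        X[i, j] = X[nearest, j].mean()               # impute from the k nearest donors
    return X
```

Weighting the distance by MI makes class-informative features dominate the neighbor search, so donors tend to come from the same class as the incomplete sample.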